
    Extracting Multilingual Natural-Language Patterns for RDF Predicates

    Abstract. Most knowledge sources on the Data Web were extracted from structured or semi-structured data. They therefore cover only a small fraction of the information available on the document-oriented Web. In this paper, we present BOA, a bootstrapping strategy for extracting RDF from text. The idea behind BOA is to use background knowledge from the Data Web to extract, from unstructured data, natural-language patterns that represent predicates found on the Data Web. These patterns are then used to extract instance knowledge from natural-language text. This knowledge is finally fed back into the Data Web, thereby closing the loop. The approach followed by BOA is largely independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora and two different languages. We evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with very high accuracy. Moreover, we provide the first multilingual repository of natural-language representations of predicates found on the Data Web.
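    The bootstrapping loop described above can be pictured with a deliberately simplified Python sketch. It assumes that seed facts are given as (subject label, object label) pairs, that a pattern is simply the text found between the two labels in a sentence, and that entities are runs of capitalised words. BOA's actual pattern scoring, filtering and entity handling are not reproduced, and the function names and the `ENTITY` heuristic are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for entity recognition: a run of capitalised words.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def extract_patterns(seed_pairs, sentences, min_support=2):
    """Collect the text between known subject/object labels as candidate patterns."""
    counts = Counter()
    for subj, obj in seed_pairs:
        for sentence in sentences:
            i, j = sentence.find(subj), sentence.find(obj)
            # Only consider sentences in which the subject precedes the object.
            if i != -1 and j != -1 and i + len(subj) <= j:
                middle = sentence[i + len(subj):j].strip()
                if middle:
                    counts[middle] += 1
    # Keep patterns supported by at least `min_support` seed occurrences.
    return {p for p, c in counts.items() if c >= min_support}


def apply_patterns(patterns, sentences, known=frozenset()):
    """Use learned patterns to extract (subject, object) pairs not already known."""
    facts = []
    for pattern in sorted(patterns):
        regex = re.compile(ENTITY + r"\s+" + re.escape(pattern) + r"\s+" + ENTITY)
        for sentence in sentences:
            for subj, obj in regex.findall(sentence):
                if (subj, obj) not in known and (subj, obj) not in facts:
                    facts.append((subj, obj))
    return facts
```

    With the seeds ("Berlin", "Germany") and ("Paris", "France") and a corpus containing "Berlin is the capital of Germany.", "Paris is the capital of France." and "Rome is the capital of Italy.", the sketch learns the pattern "is the capital of" and then extracts the new pair ("Rome", "Italy"). Apart from the capitalisation heuristic, nothing in it depends on the language of the corpus.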

    Distributed Holistic Clustering on Linked Data

    Link discovery is an active field of research that supports data integration on the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. Here we propose a distributed holistic approach that links many data sources by clustering entities that represent the same real-world object. Our clustering approach provides a compact, fused representation of entities and can identify errors in existing links as well as many new links. We support distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering and evaluate our methods with respect to effectiveness and efficiency on large data sets from the geographic and music domains.
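    One way to see how a clustering over links can be computed in a distributed fashion is min-label propagation: every entity repeatedly adopts the smallest label among itself and its neighbours until nothing changes, and each round needs only local information, which suits vertex-centric engines. The Python sketch below runs those rounds sequentially on one machine and is not the paper's clustering algorithm. It only forms connected components of the link graph, represents entities as hypothetical (source, id) pairs, and flags clusters that contain two entities from the same source as candidate link errors, assuming each individual source is duplicate-free.

```python
from collections import defaultdict


def connected_clusters(links):
    """Group linked entities via synchronous min-label propagation.

    Entities are (source, id) tuples. In each round every entity takes the
    smallest label in its neighbourhood, mimicking one superstep of a
    vertex-centric (Pregel-style) computation.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    label = {v: v for v in neighbours}
    changed = True
    while changed:
        changed = False
        new_label = {}
        for v, adj in neighbours.items():
            smallest = min([label[v]] + [label[u] for u in adj])
            new_label[v] = smallest
            changed = changed or smallest != label[v]
        label = new_label
    clusters = defaultdict(list)
    for v, lab in label.items():
        clusters[lab].append(v)
    return {lab: sorted(members) for lab, members in clusters.items()}


def suspicious_clusters(clusters):
    """Labels of clusters holding two entities from one source.

    Assumes every single source is free of duplicates, so such a cluster
    hints at a wrong link.
    """
    flagged = []
    for lab, members in clusters.items():
        sources = [src for src, _ in members]
        if len(sources) != len(set(sources)):
            flagged.append(lab)
    return sorted(flagged)
```

    A production version would partition the label table across workers and exchange only changed labels between rounds; the sequential loop keeps the sketch short.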

    DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data

    Abstract. Triple stores are the backbone of an increasing number of Data Web applications. It is thus evident that the performance of these stores is mission-critical both for individual projects and for data integration on the Data Web in general. Consequently, when implementing any of these applications it is of central importance to have a clear picture of the strengths and weaknesses of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and thus settled on measuring performance with SQL-like queries against a relational database that had been converted to RDF. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triple stores, and we provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB and BigOWLIM. A subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is far less homogeneous than previous benchmarks suggested.
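    The query-log mining and SPARQL feature analysis steps can be pictured with a small Python sketch. It simplifies under stated assumptions: queries are normalised into templates by replacing IRIs and literals with placeholders using regular expressions rather than a SPARQL parser, exact template equality stands in for the clustering step, and features are detected by keyword matching. The function names and the feature list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical subset of SPARQL features to look for.
FEATURES = ("DISTINCT", "FILTER", "LANG", "OPTIONAL", "REGEX", "UNION")


def to_template(query):
    """Replace literals and IRIs with placeholders and normalise whitespace."""
    q = re.sub(r'"[^"]*"(@[\w-]+)?', '"LIT"', query)  # string literals, optional language tag
    q = re.sub(r"<[^<>\s]*>", "<IRI>", q)             # IRIs in angle brackets
    return re.sub(r"\s+", " ", q).strip()


def features(query):
    """Which of the listed keywords occur as whole words (case-insensitive)."""
    upper = query.upper()
    return {f for f in FEATURES if re.search(r"\b" + f + r"\b", upper)}


def frequent_templates(log, k):
    """The k most frequent query templates with their counts."""
    return Counter(to_template(q) for q in log).most_common(k)
```

    Two lookups of different resources, such as `SELECT ?p WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o }` and the same query for Paris, collapse into the single template `SELECT ?p WHERE { <IRI> ?p ?o }` with a count of 2; the feature sets of the frequent templates can then be inspected to select a varied benchmark query mix.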

    Seasonal changes in social networks of giraffes

    Fission‐fusion societies allow animals to respond flexibly to environmental changes by adapting the size and composition of a group. Although group members change frequently in these systems, associations with preferred partners may be found. In this study, we examined the grouping patterns of a population of 80 individual giraffes in a fenced South African game reserve over a 12‐month period. Using social network analysis to evaluate observed associations, we analysed both sex‐ and season‐related grouping patterns within the study population. Mixed‐sex groups represented 49% of all groups observed. Although the overall distribution of group composition did not differ significantly between seasons, in winter the number of encountered single females decreased by 50%, whereas the number of multi‐male groups increased by over 50%. Overall average group size did not differ significantly between seasons, but significantly larger multi‐female and multi‐male groups were seen in winter. Within the social network, two distinct clusters were found in summer, whereas in winter the population was more divided, with five distinct clusters emerging. The strongest ties (highest HWIG values) were found between adult females. Our study revealed that giraffes not only live in a highly flexible fission‐fusion social system but also show seasonal patterns of grouping.
    Funding: The National Research Foundation, South Africa, and a postgraduate bursary from the DST-NRF SARChI Chair for Mammal Behavioural Ecology and Physiology to NCB.
    Journal: http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1469-7998
    Date: 2019-06-01
    Departments: Anatomy and Physiology; Centre for Veterinary Wildlife Studies; Zoology and Entomology
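    The tie strengths in such a network come from an association index. As an illustration, the Python sketch below computes the simple half-weight index, HWI = x / (x + y_AB + 0.5 (y_A + y_B)), for one pair of individuals from grouped sightings. The study reports HWIG values, a variant that additionally corrects for individual gregariousness; that correction is not implemented here. The data layout (a list of sampling periods, each a list of groups given as sets of individual IDs) and the function name are assumptions.

```python
def half_weight_index(periods, a, b):
    """Simple half-weight index for individuals a and b.

    HWI = x / (x + y_ab + 0.5 * (y_a + y_b)), counted over sampling periods:
    x    periods in which a and b were in the same group,
    y_ab periods in which both were seen but in different groups,
    y_a  periods in which only a was seen,
    y_b  periods in which only b was seen.
    """
    x = y_ab = y_a = y_b = 0
    for groups in periods:
        group_a = next((g for g in groups if a in g), None)
        group_b = next((g for g in groups if b in g), None)
        if group_a is not None and group_b is not None:
            if b in group_a:
                x += 1
            else:
                y_ab += 1
        elif group_a is not None:
            y_a += 1
        elif group_b is not None:
            y_b += 1
    denominator = x + y_ab + 0.5 * (y_a + y_b)
    return x / denominator if denominator else 0.0
```

    For example, two females seen together in two periods, in separate groups in one period, and with only the first seen in a fourth period get HWI = 2 / (2 + 1 + 0.5) ≈ 0.57. Computing this value for every pair yields the weighted edges on which clusters are then detected.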

    When to Reach for the Cloud: Using Parallel Hardware for Link Discovery

    Abstract. With the ever-growing amount of RDF data available across the Web, discovering links between datasets and deduplicating resources within knowledge bases have become tasks of crucial importance. In recent years, several link discovery approaches have been developed to tackle the runtime and complexity problems intrinsic to link discovery. Yet, so far, little attention has been paid to managing hardware resources for the execution of link discovery tasks. This paper addresses this research gap by investigating the efficient use of hardware resources for link discovery. We implement the HR³ approach for three different parallel processing paradigms, including the use of GPUs and MapReduce platforms, and perform a thorough performance comparison of these implementations. Our results show that certain tasks that appear to require cloud computing techniques can actually be accomplished using standard parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding when to use which hardware for link discovery.
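    HR³ reduces the number of comparisons in distance-based link discovery by tiling the space into hypercubes and comparing only entities in cubes that can contain matches. The Python sketch below illustrates that idea for Euclidean distance only; it is not a reimplementation of HR³ or of the paper's GPU and MapReduce versions. The cube width is θ/α for a hypothetical integer granularity parameter `alpha`, cube offsets whose minimum possible distance exceeds θ are pruned, and source entities are split across a thread pool to show how the work partitions. Threads keep the example short; CPU-bound pure-Python code would need a process pool for real speed-ups.

```python
import itertools
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def cube_of(point, delta):
    """Integer coordinates of the hypercube (edge length delta) containing point."""
    return tuple(math.floor(x / delta) for x in point)


def candidate_offsets(dim, alpha):
    """Offsets of cubes that may hold points within theta = alpha * delta.

    Along axis i, cubes o_i steps apart are at least delta * max(|o_i| - 1, 0)
    apart, so offsets whose minimum distance exceeds theta are pruned.
    """
    offsets = []
    for o in itertools.product(range(-alpha, alpha + 1), repeat=dim):
        if sum(max(abs(k) - 1, 0) ** 2 for k in o) <= alpha ** 2:
            offsets.append(o)
    return offsets


def link(sources, targets, theta, alpha=2, workers=4):
    """Sorted (source_id, target_id) pairs with Euclidean distance <= theta."""
    if not sources:
        return []
    delta = theta / alpha
    index = defaultdict(list)  # cube -> target ids
    for tid, point in targets.items():
        index[cube_of(point, delta)].append(tid)
    dim = len(next(iter(sources.values())))
    offsets = candidate_offsets(dim, alpha)

    def match_chunk(chunk):
        found = []
        for sid in chunk:
            s = sources[sid]
            cube = cube_of(s, delta)
            for o in offsets:
                neighbour = tuple(c + k for c, k in zip(cube, o))
                for tid in index.get(neighbour, ()):
                    if math.dist(s, targets[tid]) <= theta:
                        found.append((sid, tid))
        return found

    ids = list(sources)
    chunks = [ids[i::workers] for i in range(workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(match_chunk, chunks))
    return sorted(pair for part in parts for pair in part)
```

    For instance, `link({"s1": (0.0, 0.0), "s2": (5.0, 5.0)}, {"t1": (0.3, 0.4), "t2": (5.0, 6.5)}, theta=1.0)` returns `[("s1", "t1")]`. Larger `alpha` values give smaller cubes and fewer wasted distance computations at the cost of more cube lookups, a trade-off of the kind the break-even analysis above addresses at the hardware level.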

    WOMBAT - a generalization approach for automatic link discovery

    A significant portion of the evolution of Linked Data datasets lies in updating the links to other datasets. An important challenge when aiming to update these links automatically under the open-world assumption is that usually only positive examples for the links exist. We address this challenge by presenting and evaluating Wombat, a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator that traverses the space of link specifications. We study the theoretical characteristics of Wombat and evaluate it on 8 different benchmark datasets. Our evaluation suggests that Wombat outperforms state-of-the-art supervised approaches while relying on less information. Moreover, our evaluation suggests that Wombat's pruning algorithm allows it to scale well even on large datasets.
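    The idea of learning a link specification by generalisation can be illustrated with a toy Python sketch; it is not Wombat's refinement operator or scoring. Here a specification is a disjunction of hypothetical atomic rules (property, threshold) over a string similarity from `difflib`, an upward refinement adds one more disjunct (so the set of accepted links can only grow), and the score is an F-measure computed from positive examples alone under the simplifying assumption that every predicted link outside the positives is wrong. The search greedily generalises while the score improves.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def links_of(spec, source, target):
    """Pairs accepted by at least one atomic rule (property, threshold) of spec."""
    return {
        (s, t)
        for (s, s_props), (t, t_props) in product(source.items(), target.items())
        if any(similarity(s_props[p], t_props[p]) >= th for p, th in spec)
    }


def score(links, positives):
    """F-measure, treating every predicted link outside `positives` as wrong."""
    tp = len(links & positives)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(links), tp / len(positives)
    return 2 * precision * recall / (precision + recall)


def upward_refinements(spec, atoms):
    """More general specifications: add one disjunct not yet present."""
    return [spec | {atom} for atom in atoms if atom not in spec]


def learn(source, target, positives, props, thresholds=(1.0, 0.9, 0.8)):
    """Greedily generalise from the best single rule while the score improves."""
    atoms = [(p, th) for p in props for th in thresholds]

    def fitness(spec):
        return score(links_of(spec, source, target), positives)

    best = max((frozenset({atom}) for atom in atoms), key=fitness)
    best_score = fitness(best)
    while True:
        candidates = upward_refinements(best, atoms)
        if not candidates:
            break
        candidate = max(candidates, key=fitness)
        if fitness(candidate) <= best_score:
            break
        best, best_score = candidate, fitness(candidate)
    return best, best_score
```

    If names match exactly for two city pairs and a shared code matches only the third, the sketch starts from the name rule and generalises to "name OR code", which covers all three positive examples.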

    ROCKER
